File paths and working directories

File path: a character string that tells you the location of a file
Absolute path: starts from the “root” directory
- e.g. /Users/kjytay/Downloads/datafile.csv
Relative path: starts from the current directory (denoted by .)
- e.g. If I am in the folder /Users/kjytay: ./Downloads/datafile.csv
- e.g. If I am in the folder /Users/kjytay/Downloads: ./datafile.csv or simply datafile.csv
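As a small illustration in base R, reusing the example paths above (nothing is read from disk), file.path() joins path pieces with "/", and dirname()/basename() split a path back into its folder and file name. The working directory /Users/kjytay is only assumed for the example.

```r
# Absolute path: starts from the root directory "/"
abs_path <- "/Users/kjytay/Downloads/datafile.csv"

# The same file as a relative path, assuming the working directory is /Users/kjytay
rel_path <- file.path(".", "Downloads", "datafile.csv")
rel_path
# [1] "./Downloads/datafile.csv"

# Joining the (assumed) working directory with the relative part gives the absolute path
file.path("/Users/kjytay", "Downloads", "datafile.csv")
# [1] "/Users/kjytay/Downloads/datafile.csv"

# Split an absolute path into its folder and its file name
dirname(abs_path)   # "/Users/kjytay/Downloads"
basename(abs_path)  # "datafile.csv"
```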

File paths and working directories

Working directory: where R looks for files that you ask it to load
Also where R will put any files that you ask it to save
You can see your current working directory at the top of the console or by typing getwd()
You can change the working directory with the setwd() function or via Session > Set Working Directory > …
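A minimal sketch of how the working directory affects saving and loading. It uses R's temporary folder, tempdir(), as a stand-in for whatever folder you would actually use, and restores the original working directory at the end.

```r
# Remember the current working directory so it can be restored afterwards
old_wd <- getwd()

# Switch to a temporary folder (a stand-in for any folder you choose)
setwd(tempdir())
getwd()

# Saving with a bare file name writes into the working directory...
write.csv(data.frame(x = 1:3), "datafile.csv", row.names = FALSE)
file.exists("datafile.csv")
# [1] TRUE

# ...and loading with a bare file name reads from it
df <- read.csv("datafile.csv")
nrow(df)
# [1] 3

# Restore the original working directory
setwd(old_wd)
```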

Factors

A special data type in R
Useful for working with categorical variables: variables that have a fixed and known set of possible values
Why use factor variables instead of character variables?
- Character variables don’t protect you from typos
- Character variables don’t sort in a useful way
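The two problems with character variables can be seen in a few lines of base R. The month data below is made up for illustration; month.abb is R's built-in vector of month abbreviations ("Jan", "Feb", ..., "Dec").

```r
# Month abbreviations stored as plain character strings (note the typo "Jam")
x_chr <- c("Dec", "Apr", "Jam", "Mar")
sort(x_chr)
# Alphabetical order, and the typo is accepted silently: "Apr" "Dec" "Jam" "Mar"

# The same data as a factor with a fixed, known set of levels
x_fct <- factor(x_chr, levels = month.abb)
x_fct
# "Jam" is not a valid level, so it becomes NA instead of slipping through

sort(x_fct)
# Calendar order: Mar Apr Dec (sort() drops the NA by default)
```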

Functions for factors

fct_recode(): change factor levels
fct_collapse() and fct_lump(): reduce the number of factor levels
fct_infreq(): sort factor levels by how often they appear
fct_reorder(): sort factor levels by some other variable
fct_rev(): reverse the order of the factor levels

All these functions are part of the forcats package, which is automatically loaded when you load the tidyverse package.
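A quick sketch of each function on a small factor. The fruit and price data are hypothetical toy values, and the example assumes the forcats package is installed (it comes with the tidyverse). The expected levels in the comments follow from the toy counts (banana 3, cherry 2, apple 1) and median prices (cherry 1, apple 2, banana 4).

```r
library(forcats)

# Hypothetical toy data: a fruit factor and a price for each observation
fruit <- factor(c("banana", "cherry", "banana", "apple", "banana", "cherry"))
price <- c(4, 1, 4, 2, 4, 1)

levels(fruit)                                   # "apple" "banana" "cherry" (alphabetical default)

# fct_recode(): new_name = "old_name"
levels(fct_recode(fruit, Apple = "apple"))      # "Apple" "banana" "cherry"

# fct_collapse(): merge several levels into one new level
levels(fct_collapse(fruit, not_apple = c("banana", "cherry")))  # "apple" "not_apple"

# fct_lump(): keep the n most common levels, lump the rest into "Other"
levels(fct_lump(fruit, n = 1))

# fct_infreq(): most frequent level first
levels(fct_infreq(fruit))                       # "banana" "cherry" "apple"

# fct_reorder(): order levels by the median price of each level
levels(fct_reorder(fruit, price))               # "cherry" "apple" "banana"

# fct_rev(): reverse the current level order
levels(fct_rev(fruit))                          # "cherry" "banana" "apple"
```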

Agenda for today

Reproducible research
- R scripts
- R markdown

Reproducible research: what & why

Reproducible research: publishing data analyses together with their data and code so that others may “reproduce” the findings.

Why reproducible research?

Increase transparency and robustness of analyses
Preserve integrity of analyses over time
Reduce incentive for dishonest practices

R scripts

An R script is a file containing lines of R code that are meant to be run together, from top to bottom
R scripts are typically working files, not intended for presentation
R scripts have .R file extensions
Comments (text after a #) can be inserted to explain the code
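To illustrate running a script "all together", the sketch below writes a tiny, hypothetical three-line script to a temporary .R file and runs it with source(), which executes every line of the file in order. The script content is made up for the example and uses R's built-in mtcars data.

```r
# Write a tiny hypothetical script to a temporary .R file
script <- tempfile(fileext = ".R")
writeLines(c(
  "# Compute the average fuel efficiency in the built-in mtcars data",
  "avg_mpg <- mean(mtcars$mpg)",
  "cat('Average mpg:', round(avg_mpg, 2), '\\n')"
), script)

# source() runs every line of the script, top to bottom, in one go
source(script)
# prints: Average mpg: 20.09
```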

R markdown

RStudio: R markdown is a document format which allows you to “weave together narrative text and code to produce elegantly formatted output.”

Made possible by the knitr package (Yihui Xie)

(Source: Vimeo)

R markdown: output (1)

(Source: Github: kjytay/FIFA-world-cup-2018)

R markdown: output (2)

(Source: Github: kjytay/FIFA-world-cup-2018)

R markdown: output (3)

(Source: Github: kjytay/FIFA-world-cup-2018)

R markdown: more details

Text (written in Markdown), interspersed with code chunks, “knit” into a document using the knitr package
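As a rough sketch of what knitting does, knitr's knit() can take a small R Markdown document as text, run its R code, and return Markdown with the results woven in. The document below is a made-up example with one line of narrative (including inline R code) and one code chunk; it assumes the knitr package is installed.

```r
library(knitr)

# A tiny R Markdown document, one element per line:
# narrative text with inline R code, followed by a code chunk
rmd <- c(
  "The built-in `mtcars` data has `r nrow(mtcars)` rows.",
  "",
  "```{r}",
  "mean(mtcars$mpg)",
  "```"
)

# knit() runs the R code and weaves its results into Markdown text
md <- knit(text = rmd, quiet = TRUE)
cat(md)
# The inline code becomes "32"; the chunk's output appears in a "## [1] ..." block
```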
Typically used for presentation
R markdown files have .Rmd extensions
R markdown cheatsheet and reference guide available here

Surprise: (Almost) all the class material (including slides) was created with R markdown!

Quick intro to Markdown

Markdown is a lightweight markup language: a simple way to write a plain text document that can be converted into a web page (i.e. HTML) with basic styling.

Has support for:

Headers
Emphasis (italics, bold, ~~strikethrough~~)
Lists
Links
Images
Etc…
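A small sketch of the conversion from Markdown to HTML, done from R. It assumes the commonmark package is installed; commonmark is just one Markdown converter (R Markdown itself goes through Pandoc), chosen here only because it runs in a single function call.

```r
library(commonmark)

# Markdown source covering the features listed above
md_text <- c(
  "# A header",
  "",
  "Some *italics*, some **bold** and some ~~strikethrough~~.",
  "",
  "- a list item",
  "- a [link](https://www.r-project.org)"
)

# extensions = TRUE enables GitHub-style extras such as ~~strikethrough~~
html <- markdown_html(md_text, extensions = TRUE)
cat(html)
```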

Markdown reference here.

To see what your Markdown (.md) document looks like in real time, use an online Markdown editor (e.g. dillinger.io)

Today’s dataset: Airbnb listings

(Source: Hotel Technology News)

Optional material

Rmd workflow (basic)

Edit .Rmd file in RStudio.
Knit the document (either by hitting the “Knit” button or using a keyboard shortcut).
- When you press “Knit”, the file is automatically saved.
- Next, RStudio opens a new console, “knits” the document there, then closes that console. No code is run in your original console!
- RStudio creates a .html file in the same folder as the .Rmd file.
Preview output in the preview pane, or by opening the .html file.
- To make changes, go back to the first step (edit the .Rmd file).
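Because knitting happens in a separate session, code in the .Rmd file cannot rely on objects you created in your console. The sketch below imitates this with knitr's knit() and a fresh environment whose parent is base R, so global objects are invisible to it. This is only a rough stand-in: the Knit button uses a separate R process, not an environment. The names console_value and rmd_value are hypothetical.

```r
library(knitr)

# An object that exists only in your interactive session
console_value <- "defined in the console"

rmd <- c(
  "```{r}",
  "exists('console_value')",
  "rmd_value <- 1",
  "```"
)

# Knit in a fresh environment that cannot see the global environment
knit_env <- new.env(parent = baseenv())
md <- knit(text = rmd, envir = knit_env, quiet = TRUE)
cat(md)
# The chunk prints FALSE: console_value is not visible while knitting

# Objects created while knitting stay out of the global environment too
exists("rmd_value", envir = globalenv(), inherits = FALSE)
```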

Common Rmd chunk options

include = FALSE: prevents code and results from appearing in the finished file. R Markdown still runs the code in the chunk, and the results can be used by other chunks.
- Useful for decluttering your Rmd output, showing only essential code.
echo = FALSE: prevents code, but not the results, from appearing in the finished file.
- Useful if you just want to show figures but not the code that generated them.
eval = FALSE: Code appears in the output but is not run.
- Useful for presenting code for demonstration purposes.
message = FALSE: prevents messages that are generated by code from appearing in the finished file.
- Useful for suppressing messages when loading packages.
warning = FALSE: prevents warnings that are generated by code from appearing in the finished file.
- Useful for suppressing warnings when loading packages, plotting data or fitting models.
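A minimal sketch showing several of these options side by side. It knits a made-up four-chunk document with knitr's knit() (assumed installed) and lets you inspect which code and results end up in the Markdown output.

```r
library(knitr)

rmd <- c(
  # include = FALSE: the chunk runs, but neither code nor results are shown
  "```{r setup, include = FALSE}",
  "x <- 40",
  "```",
  # echo = FALSE: the result is shown, the code is hidden
  "```{r answer, echo = FALSE}",
  "x + 2",
  "```",
  # eval = FALSE: the code is shown but never run
  "```{r demo, eval = FALSE}",
  "stop('this line is never executed')",
  "```",
  # message = FALSE: the code is shown, its message is left out of the output
  "```{r quiet, message = FALSE}",
  "message('loading data...')",
  "```"
)

md <- knit(text = rmd, quiet = TRUE)
cat(md)
```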

STATS 32 Session 8: Reproducible research
